Entrez: making use of its power.
Abstract
Entrez¹ is a data retrieval system developed by the National Center for Biotechnology Information (NCBI) that provides integrated access to a wide range of data domains, including literature, nucleotide and protein sequences, complete genomes, three-dimensional structures, and more. Beyond the records that match a query exactly, Entrez's search features also retrieve related records within a data domain that might otherwise be missed, as well as associated records in other data domains. These features let us assemble previously disparate pieces of an information puzzle for a topic of interest. Using Entrez effectively requires an understanding of the available data domains, the variety of data sources and types within each domain, and Entrez's advanced search features. This tutorial uses the human MLH1 gene, implicated in colon cancer, to demonstrate the wide variety of information that can be gathered rapidly for a single gene. The counts noted in the search results will, of course, change over time as the databases grow, but the same techniques can be applied to any topic of interest. The search goals are to:
• separate the wheat from the chaff, identifying a representative, well-annotated mRNA sequence record;
• retrieve associated literature and protein records;
• identify conserved domains within the protein;
• identify similar proteins;
• identify known mutations within the gene or protein;
• find a resolved three-dimensional structure for the protein or, in its absence, identify structures with homologous sequences;
• view the genomic context and download the sequence region.

An Entrez data domain usually encompasses data from several different source databases. The first goal is to identify a representative, well-annotated mRNA sequence record among the many available in the Entrez Nucleotide data domain.
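The first two search goals can also be pursued programmatically through NCBI's E-utilities, the URL-based interface to Entrez. The sketch below only builds request URLs and makes no network calls: ESearch URLs contrasting an unqualified colon cancer query with a refined query for human MLH1 mRNA records from RefSeq, and ELink URLs that follow links from a chosen nucleotide record to associated protein and PubMed records. Several details are assumptions rather than part of this tutorial: the database names `nuccore`, `protein` and `pubmed`, the field tags `[gene]` and `[orgn]`, the properties `biomol_mrna[prop]` and `srcdb_refseq[prop]`, and the record UID `12345`, which is purely hypothetical. Check them against current NCBI documentation before relying on them.

```python
from urllib.parse import urlencode

# Base URL for NCBI's E-utilities (programmatic access to Entrez).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that runs one Entrez query against one database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """Build an ELink URL that follows Entrez links from records in `dbfrom`
    to associated records in `db` (e.g. nuccore -> protein, nuccore -> pubmed)."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


# Unqualified query: mixes archival, curated, PDB-derived and TPA records.
broad = esearch_url("nuccore", "colon cancer")

# Refined query (assumed field tags/properties): human MLH1 mRNA from RefSeq.
refined_term = " AND ".join([
    "MLH1[gene]",
    "human[orgn]",
    "biomol_mrna[prop]",
    "srcdb_refseq[prop]",
])
refined = esearch_url("nuccore", refined_term)

# HYPOTHETICAL UID standing in for the chosen representative mRNA record.
example_id = "12345"
to_protein = elink_url("nuccore", "protein", [example_id])
to_pubmed = elink_url("nuccore", "pubmed", [example_id])

if __name__ == "__main__":
    for url in (broad, refined, to_protein, to_pubmed):
        print(url)
```

Fetching these URLs (for example with `urllib.request.urlopen`) returns XML listing the matching or linked UIDs. NCBI also asks scripted clients to identify themselves and to limit their request rate; those parameters are left out of this sketch.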
The Entrez Nucleotide domain includes sequence records from the archival GenBank database, the curated RefSeq² database, nucleotide sequences extracted from Protein Data Bank (PDB)³ records, and a new Third-Party Annotation (TPA) database. As a result, an unrefined search can retrieve records of varying quality (in both sequence and annotation), and there can be a high degree of redundancy in the search results, depending on how many labs have submitted sequence data for a gene or its fragments. For example, an unqualified search of Entrez Nucleotide for colon cancer currently retrieves >10,000 hits. The results include archival and curated records, characterised sequences and
Similar resources
BioBrowsing: Making the Most of the Data Available in Entrez
One of the most popular ways to access public biological data is through portals such as Entrez (NCBI), which allows users to navigate the data of 34 major biological sources by following cross-references. In this process, data entries are inspected one after the other, and cross-references to additional data available in other sources may be followed. This navigational process may be time-consum...
Complete genomes in WWW Entrez: data representation and analysis
MOTIVATION: The large amount of genome sequence data now publicly available can be accessed through the National Center for Biotechnology Information (NCBI) Entrez search and retrieval system, making it possible to explore data of a breadth and scope exceeding traditional flatfile views. RESULTS: Here we report recent improvements for completely sequenced genomes from viruses, bacteria, and yea...
Fast parsers for Entrez Gene
NCBI completed the transition of its main genome annotation database from LocusLink to Entrez Gene in spring 2005. However, to date few parsers exist for the Entrez Gene annotation file. Owing to the widespread use of LocusLink and the popularity of the Perl programming language in bioinformatics, a publicly available, high-performance Entrez Gene parser in Perl is urgently needed. We present f...
Exclusionary Decision Making in Tehran Metropolitan Region- Complexity, Self organization and Power of Action
Viewing urban areas as webs of complex, interwoven networks, this article aims to analyze the decision-making process and its outcomes in the Tehran metropolitan region. To do so, the theoretical basis of complexity in urban life and its implications for planning is first reviewed. Using the main notion of power of action, i.e., agency, and through creating the network of actors and their rela...
Journal title:
- Briefings in bioinformatics
Volume 4, Issue 2
Publication date: 2003